Predicting bioactivity of compound-drug target protein pairs using support vector regression models reflecting ligand efficiency
Abstract
Predicting the bioactivity of compounds against drug target proteins using machine learning methods is one of the most intensively studied areas in drug discovery and development. Although many previous machine learning studies have succeeded in predicting novel ligand-protein interactions with high performance, all of them to date have depended heavily on the direct use of raw bioactivity data, that is, ligand potencies measured as IC50, EC50, Ki, and Kd and deposited in databases. In our previous study [1], we showed that, using support vector machines, binary classification models trained on data reflecting a representative measure of ligand efficiency, the Binding Efficiency Index (BEI) [2], classify active and inactive compound-protein pairs better than models trained on data reflecting IC50 or Ki. In this study, we report that this result also holds when support vector regression (SVR) models are applied to bioactivity data. Using bioactivity data measured as IC50 in the GPCRSARfari ver. 2, KinaseSARfari ver. 4, and ChEMBL 14 databases [3], we retrieved bioactivity data associated with G protein-coupled receptors, protein kinases, and ion channels and created four types of training data: IC50-based, pIC50-based, BEI-based, and Surface Efficiency Index (SEI)-based. Values of pIC50, BEI, and SEI were derived from the IC50 values observed in the databases. The number of instances in the training data is shown in Table 1. To represent compound-protein pairs in the training data, three kinds of compound descriptors (MACCS, 2D descriptors in MOE, and OpenBabel FP2) and a single protein descriptor (the frequencies of amino acid dimers in the protein sequence) were used. From the GPCRSARfari ver. 3, KinaseSARfari ver. 5.01, and ChEMBL 15 databases, we collected newly added bioactivity data and used them as validation data for evaluating the performance of the constructed SVR models.
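The conversion of IC50 values into the other three target quantities, and the dimer-frequency protein descriptor, can be illustrated with a minimal sketch. It assumes the commonly used definitions of the efficiency indices (BEI as pIC50 divided by molecular weight in kDa, SEI as pIC50 divided by polar surface area in units of 100 Å²); the abstract does not state the exact formulas or units used, so the function names, the nanomolar input unit, and the normalization of the dimer counts are assumptions made for illustration only.

```python
import math
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L). Input is assumed to be in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def bei(pic50: float, mol_weight_da: float) -> float:
    """Binding Efficiency Index, assumed here as pIC50 / (MW in kDa)."""
    return pic50 / (mol_weight_da / 1000.0)


def sei(pic50: float, polar_surface_area_a2: float) -> float:
    """Surface Efficiency Index, assumed here as pIC50 / (PSA / 100 A^2)."""
    return pic50 / (polar_surface_area_a2 / 100.0)


def dimer_frequencies(sequence: str) -> list[float]:
    """Relative frequencies of the 400 overlapping amino acid dimers.

    Dimers containing non-standard residue codes are skipped. The result is
    ordered alphabetically by dimer ("AA", "AC", ..., "YY").
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in sorted(counts)]


if __name__ == "__main__":
    # Hypothetical compound: IC50 = 100 nM, MW = 350 Da, PSA = 70 A^2.
    p = pic50_from_ic50_nm(100.0)
    print(p, bei(p, 350.0), sei(p, 70.0))
```

Under these assumptions, each compound-protein pair would be represented by concatenating a compound descriptor vector with the 400-element dimer-frequency vector, and the regression target would be one of IC50, pIC50, BEI, or SEI.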
Objective comparisons of the SVR models showed that, overall, their prediction capabilities follow the order SEI > BEI > pIC50 > IC50. This result is independent of the compound descriptors used and of the drug target protein family. The superiority of the ligand efficiency-based SVR models may be partially attributed to the distinct distribution patterns of pIC50s, BEIs, and SEIs: BEIs span a narrower range than pIC50s, and SEIs a narrower range than BEIs.